DOCUMENT RESUME

ED 185 077                                              TM 800 068

AUTHOR        Barnett, Nancy
TITLE         The Analysis of Technical Validity and Reliability in
              Bilingual Language Assessment Instruments: The
              Language Assignment Umpire (L.A.U.) Language
              Dominance Test.
PUB DATE      [79]
NOTE

EDRS PRICE    MF01/PC01 Plus Postage.
DESCRIPTORS   Bilingual Education; *Bilingual Students; Elementary
              Education; *Language Dominance; Language Fluency;
              *Language Tests; *Spanish Speaking; Test Bias; Test
              Interpretation; Test Reliability; *Test Reviews; *Test
              Validity; *Verbal Tests
IDENTIFIERS   Language Assignment Umpire


ABSTRACT

Techniques for assessing test validity and reliability are applied
to an analysis of an unpublished test, in order to familiarize test
users in local bilingual programs with the technical evidence that
should be available for instruments of potential use in placing
limited English-speaking students. The instrument studied, the
Language Assignment Umpire (L.A.U.), is designed to identify
language dominance by means of four oral tasks of sentence memory,
synonyms, antonyms, and digit reversal. The validity of the L.A.U.
is considered in determining language proficiency as well as
language dominance. Lexical difficulty and counts of phonemes,
syllables, words, and morphemes are examined for a linguistic
analysis of the L.A.U. The sentence memory task is examined for the
syntactic complexity of its items. Statistical analyses are reported
for a variety of correlations at both the intra-test and external
criteria levels. A brief discussion is included of the results of a
study in which the L.A.U. and other language data were used to
determine the effectiveness of the Rochester, New York bilingual
program. (Author/CTM)



***********************************************************************
*    Reproductions supplied by EDRS are the best that can be made     *
*                     from the original document.                     *
***********************************************************************



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY

TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



The Analysis of Technical Validity and Reliability in Bilingual
Language Assessment Instruments: The Language Assignment
Umpire (L.A.U.) Language Dominance Test



Nancy Barnett

University of Rochester



A major concern in the current issue of the identification of
bilingual education needs is that of the technical quality of evaluative
instruments that are being used in the assessment of bilingual language
proficiency and dominance. Determining the quality of such instruments
is crucial, especially in that they are often relied upon as both
entrance and exit criteria for bilingual and ESL programs. The following
analysis of one such instrument may serve as an example of some of
the types of evidence of test reliability and validity that should be
looked for by test users when selecting appropriate instruments.



The current literature in language evaluation verifies many of the
"abuses, misuses, and malpractices" identified by Peña and Bernal (1978).
These include, to mention a few, invalid practices in test translation,
test importation, inappropriate addition of points, little or no
reporting of technical data, etc. In addition to these we see at times
the use of valid measures in age groups for which they are not valid,
unrealistic training or equipment required for proper administration of
the instrument, test reliability data presented as evidence of overall
test validity, or excellent reporting of technical data in a statistical
form that would be incomprehensible to most individuals who might be in
positions to read it or interpret it. In short, this author suggests
that an unnecessary "communication gap" exists within the spectrum of
those involved in bilingual language assessment. This paper addresses
primarily the communication gap between test developers and test users.
The research cited in this paper is reported for the purpose of
identifying types of evidence that should exist for language assessment
instruments. The implication is not that all local test users should
conduct independent evaluations of instruments before purchasing them,
but rather that users be sure to purchase only instruments that have
been subject to several types of reliability and validity checks.

For this author's experience in test evaluation, as well as for
part of a needed experiment in the bilingual program of the Rochester
city schools, an instrument was selected for analysis that was as new as
possible. This was done to avoid influence of other assessments that
might have been done of an older instrument. The author also chose to
work with an instrument that made use of more than one testing technique,
and that did not rely upon a lot of time for administration or expensive
materials. The test evaluated was the Language Assignment Umpire, which
was designed by Bernard Cohen in 1975. At the time of the study (Spring
1978), the test was in its field-testing phase in several areas of the
country.




For this study, the L.A.U. was administered to 126 students (64
boys, 62 girls; 61 born in the United States, 61 born in Puerto Rico,
4 born elsewhere). The table below specifies the grade and program
categories of the students tested. Three bilingual test administrators
participated in the collection of data, all of whom were experienced in
working with these age groups and were trained thoroughly in the
administration of the L.A.U.

STUDENTS ADMINISTERED THE L.A.U., SPRING OF 1978

                                             Grade
Group/School                     4th   5th   6th   7th   8th   Total

A (bilingual program)             19    17    18                  54

B (experimental; bilingual                          22    15      37
  program for five years)

C (control; Spanish-speaking                        21    14      35
  students with five years
  in traditional program)

Total number of students tested:                                 126



(Note: For the comparison phase of research, a few students in Groups B
and C were omitted because they did not meet the criterion of living in
the United States for at least five years. With those students omitted,
there were 19 seventh graders and 13 eighth graders in each group.)



Description of the L.A.U.

The L.A.U. author's objective is to measure a child's ability to
receive language, utilize the language for cognition and produce language.
Therefore, it does not contain separate sections for information
regarding separate linguistic components of language, i.e., phonology,
morphology, syntax, and semantics, or separate sections for the four
skills areas of listening, speaking, reading, and writing. Rather it is
composed of four types of verbal tasks that, in order to be carried out,
depend on an overall receptive and active knowledge of the languages
being tested. Since this original study was completed, the L.A.U. has
been revised. Therefore, specific test items or results that have since
been outdated will not be emphasized. Instead the type of analysis that
was made of the instrument will be outlined.

The L.A.U. contains four parts, all of which are administered first
in Spanish and then in English. Since this research involved Spanish/
English bilingual students, the non-English section will here be referred
to as the Spanish section. Most descriptions and general comments,
however, are applicable as well to other language forms of the test. The
test is presented orally and individually, with the student receiving all
cues without any written or visual stimuli. Under most conditions, the
test is completed within six to eight minutes.




L.A.U. Part I is a sentence memory task, in which the student is
asked to repeat the phrase or sentence that the administrator reads
only once. The sentences are presumably ordered by increasing length
and difficulty. After three consecutive mistakes, the administrator
stops and moves on to the following section. Sentences are not
scored as incorrect if there is a pronunciation or intonation difference
between what the administrator states and what the student repeats.
Neither is a response scored as incorrect if there is a syntactic change
produced by the student that does not affect overall understanding of
the language and concepts involved.

Part II contains 28 lexical items for which the student provides
synonyms. One word is given, and the student is asked to give a word
that means the same thing. Correct responses are provided in the test
manual for the administrator, who stops after two consecutive mistakes.

Part III of the L.A.U. is a section in which the student is asked
to provide an antonym for the words given. There is again, in this
section as well as in the first two sections, some flexibility of
syntactic form. In this section, again, the administrator stops
questioning and moves on to the following section after the student has
made two consecutive mistakes.

Part IV contains a digit reversal task. The student hears a
combination of two, three, four, or five numbers. He hears this
combination only once. As an example, we cite "3-8-2." The student then
reverses the numbers, stating, in this case, "2-8-3." These items are
arranged in order of difficulty, with no single digit appearing more
than once in any combination of numbers. It is terminated after two
consecutive erroneous responses.

Scoring of the L.A.U. is done by adding the number of correct
responses per section. The sections for each language are then added,
and a total score derived for each language.
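
The scoring procedure described above can be sketched in a few lines of
Python. This is a minimal illustration and not part of the L.A.U. manual:
the function names and the response data are hypothetical, and the only
values taken from the text are the stopping rules (three consecutive
mistakes in Part I, two in Parts II through IV) and the rule that a score
is the number of correct responses.

```python
# Discontinue thresholds as stated in the text; everything else is illustrative.
STOP_AFTER = {"I": 3, "II": 2, "III": 2, "IV": 2}


def score_part(responses, stop_after):
    """Count correct responses until `stop_after` consecutive errors occur.

    `responses` is a list of booleans in item order (True = correct).
    Items after the stopping point are never administered, so they are ignored.
    """
    correct = 0
    consecutive_errors = 0
    for is_correct in responses:
        if is_correct:
            correct += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == stop_after:
                break
    return correct


def score_language(parts):
    """Sum the part scores for one language.

    `parts` maps a part name ("I" to "IV") to that part's list of responses.
    """
    return sum(score_part(parts[name], STOP_AFTER[name]) for name in parts)


# Hypothetical responses for one student in one language.
example_parts = {
    "I": [True, True, False],
    "II": [True, False, False, True],
    "III": [False, False, True],
    "IV": [True, True],
}
example_total = score_language(example_parts)
```

The same function would be called once for the Spanish responses and once
for the English responses, giving the two language totals that are later
compared.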



Reliability 

The first check of the L.A.U. reliability involved the inter-rater
variability in scoring. A group of 37 students was taped by one rater
during the administration of the test. Later, the other two raters
involved in the study each listened to the tapes and rescored each of the
37 students. Inter-rater correlations were then determined for separate
test parts as well as for total scores.
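
An inter-rater correlation of this kind can be computed as sketched
below. The text does not name the coefficient that was used, so the
Pearson product-moment coefficient is an assumption here, and the two
score lists are invented for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either rater gave every student the same score.
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores given to the same five taped students by two raters.
rater_1 = [31, 42, 27, 55, 38]
rater_2 = [30, 44, 27, 53, 39]
r_total = pearson_r(rater_1, rater_2)
```

The same call would be repeated for each test part and for each pair of
raters to obtain the part-level and total-score correlations.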

The second type of reliability to be determined was test/retest
reliability of the L.A.U. A group of 25 students was retested after a
lapse of six months. Because of the individual nature of language
improvement, scores were not expected to correlate perfectly. However, a
pattern of general increase in score was looked for.

Alternate form reliability was not determined for the L.A.U. because
the alternate form, which is now available, was not available at the time
of the original testing.



A split-half method of checking internal consistency of the L.A.U.
was considered. This method is usually done systematically, e.g., odd-
numbered items in one half, and even-numbered items in the other half.
Because of the L.A.U. procedure of terminating test parts after the
student has answered two items incorrectly, this method of reliability
analysis was considered to be inappropriate.

The results of the L.A.U. reliability tests indicated that there is
high reliability in the instrument. The slight variability in scoring
creates no significant difference in comparing scores of one rater to
those of another. However, since there were two raters involved in a
comparison study which was a separate part of the research, the
difference in rating was exactly determined, and the scores of one rater
were adjusted to account for the slight variability.

Validity 


The validity of the L.A.U. was examined by a number of procedures.
The three basic classes of validity as defined by the American
Psychological Association, i.e., content, construct, and concurrent
validity, would be determined by statistical and linguistic analyses.

The first of these, content validity, being primarily rational, is
usually determined by the opinions of experts as to the extent that the
test is a reasonable sampling of the domain being evaluated. This
implies judgment of how well the test represents the domain as defined
by the author of the test. Because this work was done independently, on
a small scale, and not as a formal validation of the instrument, these
common practices were altered a bit. First of all, the rationale and
appropriateness of the sampling were evaluated without consultation with
a team of experts. Secondly, each L.A.U. part was evaluated for its
possible validity in measuring proficiency as well as the author's
objective of identifying language dominance.

The following six issues were examined for each of the test parts:

1. The linguistic components sampled in the task.

2. The type of cognition involved in the task.

3. The objectivity in scoring of the part.

4. The appropriateness of item selection for the population
   tested.

5. The increase in difficulty among items.

6. The comparability of the English and Spanish sections of the
   part.

Part I, since it is made up of sentences rather than isolated lexical
items, was analyzed in greater detail. The first four issues are
important to be determined in any instrument. In terms of the L.A.U.,
issues 5 and 6 are crucial. Since the author's objective is to
discriminate between various student levels by terminating the section
after the student has reached three items out of his range of ability, a
very gradual increase is necessary to be able to discriminate between
students of similar abilities. The comparability of difficulty of the
English and Spanish sections is also crucial, especially if language
totals are compared for interpretation of results.

In considering issues 5 and 6 in the sentence repetition task,
item counts were taken for numbers of words, phonemes, morphemes, and
syllables in each item. The examination of word and morpheme counts
is of interest to determine whether the progressions from sentence 1 to
sentence 17 in Spanish, and from sentence 1 to sentence 17 in English,
are of gradually increasing length. The task of sentence repetition
demands short-term memory, which is a function of, among other things,
the length of the utterance to be repeated. All other things being
equal, it is assumed that the shorter of two sentences will be more
easily retained for subsequent repetition. Since all other things are
never equal, and since syntactic complexity and lexical familiarity
appear to be important factors in short-term memory, the relative
lengths of sentences must be viewed as but one contributory factor in
the relative difficulty of utterances. The examination of syllable
and phoneme counts is also of interest in considering the progression
of length within the Spanish section and the English section.
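
A check of this kind on the progression of item length can be sketched
as follows. The example is illustrative only: the sentences are invented
and are not L.A.U. items, words are counted by splitting on whitespace,
and the threshold of two words per step is an arbitrary assumption rather
than a figure from the study. Counts of syllables, morphemes, or phonemes
would need a linguistic analysis that this sketch does not attempt.

```python
def word_count(sentence):
    """Number of whitespace-separated words in a sentence."""
    return len(sentence.split())


def length_jumps(sentences, max_step=2):
    """Return the word counts and the items whose length breaks a gradual increase.

    An item is flagged when it is shorter than the previous item or longer
    by more than `max_step` words. Flagged entries are
    (item number, previous count, this count), with items numbered from 1.
    """
    counts = [word_count(s) for s in sentences]
    flagged = []
    for i in range(1, len(counts)):
        step = counts[i] - counts[i - 1]
        if step > max_step or step < 0:
            flagged.append((i + 1, counts[i - 1], counts[i]))
    return counts, flagged


# Hypothetical item sequence (not taken from the test).
items = [
    "I run",
    "The dog runs fast",
    "My sister walks to school every single morning",
    "We eat",
]
counts, flagged = length_jumps(items)
```

As the surrounding text stresses, a flag on length alone is only one
contributory indicator of difficulty and would be read alongside the
syntactic and lexical analyses.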

In comparing the Spanish and English sections with one another, the
word and morpheme counts are of little interest, because of the synthetic
nature of Spanish and the analytic nature of English. The syllable and
phoneme counts, however, are of interest. The syllable count is not a
completely accurate means of comparing utterance length in Spanish and
English. This is due to the much higher incidence in English of long
syllables (CVC, CCVC, CVCC, etc.) than in Spanish, which has a higher
percentage of CV syllables (Delattre 1965:41). Combined with the
phoneme count, however, it serves to give a fairly good picture of
relative utterance length in Spanish and English.

In addition, for all parts other than the digit reversal task, the
frequency of usage of the various lexical items was considered. Word
frequency lists are typically based on adult samples of written language.
Adult vocabularies can contain lexical items that vary considerably from
the more commonly used lexical items by children. However, since there
are no lexical frequency lists available in English and Spanish for the
ages involved in this research, the adult lists were used as the only
criterion available.

Although all of these linguistic components play some role in the
relative difficulty of items, the factor of greater significance in the
case of Part I is that of syntax. In an analysis of this factor, each
L.A.U. item was examined for the surface complexity of its syntax and
variety of verb tenses. Although it is possible to analyze the syntactic
structure of these sentences in the greatest detail of their deep
structure, rules, and transformations required for their production,
this type of complex analysis was unnecessary to satisfy the objectives
of this research.

The following table illustrates the type of syntactic description
done for each Part I item:



L.A.U. PART I. SENTENCE MEMORY

Item   Tense          Syntax

       Present        NP  have  NP
                                pl

       Present        NP  V  NP  (V NP)
                                 S complement

       Present        N   aux-V  Adv-P  Adv-P
                      pl         loc    temp

       Present        NP  be  Adv   Adj  (NP  do-ing  NP)
                              temp       S pl         pl

       Present        N  aux-V  NP  (V NP  Adv-P[PP])
                                    S complement  loc

       Pres. progr.   NP  aux-V  Adv-P  Adv-P[PP]
                      pl         loc    loc

       Pres. progr.   NP  and  N  (NP V Adv-P[PP])
                                  S relative

       Present        aux-V  NP  Adv-P[PP]  that  (NP V PP)
                             pl  temp             subordinate (adverbial)



The English and Spanish sections were found to be roughly comparable in
the occurrence of auxiliaries, embeddings, and modifiers. The Spanish
section at the time, however, contained many more verb tenses than did
the English section.

The reader is reminded that in addressing issues 5 and 6 in Part I,
no single count of any one of these linguistic factors can be relied upon
solely as evidence of item difficulty, or of the comparability of the
English and Spanish sections. However, it could be said that an item
was inappropriately introduced if it showed a sudden increase of
difficulty in several of these factors simultaneously.

As previously mentioned, the L.A.U. procedure of terminating testing
of parts after a certain number of student errors calls for very gradual
increase of difficulty. If one item is inappropriately placed, no great
problem arises. However, if two inappropriately difficult items (three
items in Part I) occur consecutively, then very often a wall is created
that few students surpass. This is undesirable, in that such a "wall
effect" terminates the section for students who may not be similar at all
in proficiency of the particular language. If such a wall occurs in early
items, this is especially serious. For comparability purposes, walls at
different points, i.e., a Part I English wall occurring in items 10, 11,
and 12 and a Spanish wall occurring on items 5, 6, and 7, would be
especially serious, in that they would create radically different scores
for students who are actually equally proficient in both languages. The
L.A.U. method of totaling the number of responses rather than assigning
the last item answered correctly as the score is effective in minimizing
the effect of such walls.
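
The contrast between the two scoring methods can be made concrete with a
small sketch. Everything in it is hypothetical: the response patterns are
invented, the wall is placed at items 5 and 6 of a part with a
two-mistake stopping rule, and the function names are illustrative. It
shows only that two students stopped at the same wall receive the same
"last item correct" score but different totals.

```python
def administer(responses, stop_after=2):
    """Return the responses actually collected before the part is terminated."""
    given = []
    consecutive_errors = 0
    for is_correct in responses:
        given.append(is_correct)
        consecutive_errors = 0 if is_correct else consecutive_errors + 1
        if consecutive_errors == stop_after:
            break
    return given


def total_correct(given):
    """Score used by the L.A.U. as described in the text: number of correct responses."""
    return sum(given)


def last_correct_item(given):
    """Alternative score: position of the last item answered correctly (0 if none)."""
    positions = [i for i, r in enumerate(given, start=1) if r]
    return positions[-1] if positions else 0


# Two hypothetical students who both fail the "wall" at items 5 and 6.
stronger = administer([True, True, True, True, False, False, True])
weaker = administer([False, True, False, True, False, False, True])

scores = {
    "stronger": (total_correct(stronger), last_correct_item(stronger)),
    "weaker": (total_correct(weaker), last_correct_item(weaker)),
}
```

Under these invented patterns the last-item method assigns both students
the same score, while the total still separates them, which is the sense
in which totaling softens the effect of a wall.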






To determine construct validity, one intra-test method carried
out was that of correlating test parts to each other. Of the 126 students
who were administered the L.A.U., 13 were known to be "balanced"
bilinguals. This was determined if both the student and all five of his
or her teachers agreed on the student's bilinguality. Although the number
is small, the English and Spanish parts could be statistically
correlated. If the English and the Spanish sections were in fact
comparable in difficulty, then high correlations and similar ranges of
responses would be expected. There were high correlations in this case,
although they were misleading. The English total scores were in the range
of 7-61 correct responses, while the Spanish total scores were in the
range of 7-47. The high correlations, indicating a pattern of lower
Spanish scores than English scores for students who are balanced
bilinguals, demonstrated that revision would have to be done in making
the Spanish section and the English section more comparable.

Criterion-related validity is concurrent if the two measures are
administered at roughly the same time, and predictive if the measure
being validated is correlated with scores of a measure that is
administered after a time lapse. Concurrent validity of the English
section of the L.A.U. was determined with data available from reading
scores of the Metropolitan Achievement Test. Predictive validity was
determined with the English section of the Language Assessment Battery,
which was administered after a time lapse of six months.

Findings 


Many of the results of these validity studies are both extensive to
relate and in part outdated due to recent L.A.U. item revision. In
general, however, it can be said that at the time, the L.A.U. item
selection was more appropriate in its English section than in its Spanish
section. The English section, as illustrated in Figure 1, was found to
identify students who performed either very well or very poorly, but it
tended to inflate the scores of some students who were in the mid-range
of abilities. A small norming sample tentatively suggested that the L.A.U.
Spanish scores were 11.0 points below the English scores for monolingual
students of each language. A further finding was that the validity of the
English section decreased as student age increased.

In terms of the validity of the L.A.U. parts, the following
conclusions were drawn:

The sentence memory task was found to be a valid measure of English
proficiency, especially for the younger (4th and 5th grade) students,
correlating in the range of .750 - .871 with the external criteria of the
reading section of the Metropolitan Achievement Test and the English
section of the Language Assessment Battery. Although the English and
Spanish sections were not comparable at the time in sentence length,
lexical difficulty, and use of verb tenses, the conclusion was drawn that
the technique of sentence memory can be valid for identifying language
dominance and language proficiency.



Figure 1. L.A.U. English mid-range inflation.

The synonym task, requiring both word knowledge and semantic
processing, did not correlate well with external criteria, with
correlations suggesting that the English item selection is too easy and
that the Spanish item selection is too difficult. The formation of
synonyms is a skill that has been found to be difficult for bilingual
children. It has been shown that bilingual children have greater
flexibility in thought than monolingual children (Lambert and Tucker,
1972). Further evidence (Ben-Zeev, 1975) indicates that bilinguals are
more aware of fine details in classifying words into categories than
monolinguals. This acute sense of distinction may account for the
bilingual students' hesitation to respond to items with anything but an
exact synonym. The acceptable L.A.U. responses to the item "baby"
("kid," "child," and "infant") would perhaps not be acceptable to many
bilingual children since "kid" and "child" are not perfectly synonymous
with "baby."

These observations are not meant to suggest that synonym tasks should
not be used with bilingual students. However, it is necessary that the
items selected for use in this task have very closely associated
synonyms, and not simply related items. The "small-tiny," "lindo-bonito"
synonyms are more appropriate than are the "stove-oven," "ver-mirar"
related-item types.



The L.A.U. antonym task correlated very well with external criteria,
and the conclusion was drawn that the technique can be a valid measure of
determining language proficiency. At the time, the Spanish section
contained more commonly used items than did the English section, and the
sections are not completely comparable for the identification of language
dominance.

The cognition of the production of antonyms is similar to the
cognition involved in the production of synonyms. The difference is that
the production of antonyms is "easier." In the administration of these
tasks in the L.A.U., Vygotsky's claim (1962:88) that "the child is more
aware of differences than of similarities" was found to be strikingly
true. Many students needed several practices before understanding what
was expected of them in the synonyms section, but they seemed to
understand and produce opposites with much greater ease. This is due, in
part, to the fact that synonyms can be represented in only one way,
whereas antonyms can take the form of either contradictories, contraries,
or converses. Therefore, the individual is open to many more
possibilities for one cue in the production of antonyms. This may suggest
that the antonym task is more appropriate for younger children than is
the synonym task.

The L.A.U. digit reversal task correlated poorly with the various
levels of the MAT and the LAB (.169 to .445). These low correlations
indicate, as expected, that the task of digit reversal is only partially
an indication of a student's command of language. Digit reversal, in
fact, measures something more than command of language. The task is
partially of the type that has been labeled as a "skill at auditory
organization of verbal material," a skill which bilinguals have been
found to perform better than monolinguals (Ben-Zeev, 1975). It is also
related to reversal shift tasks which determine an ability to classify
and reclassify data. Although it can say very little about relative
language proficiency, it was found to be useful as a supplement in
identifying language dominance.

There are indeed advantages and disadvantages to be found in any type
of measurement instrument. Also, the careful examination to which the
L.A.U. was subjected in this research would uncover methodological
disadvantages in any instrument. The advantages, then, of the L.A.U. must
not be overlooked.

One merit of the L.A.U. is that it incorporates four techniques into
one instrument, thus avoiding the assessment of the language of children
in a single way. The instrument, with some item revision, is culturally
fair and can easily be transferred into several languages. It can be
administered quickly, requires a minimum amount of training for its
administration, and does not rely on expensive equipment or materials that
could significantly increase educational costs. Once the L.A.U. has been
revised, the results can be easily interpreted for educational purposes.

The L.A.U. in bilingual education

Of further interest may be the fact that the L.A.U. was used as part
of a comparison study in the effectiveness of the bilingual program in
Rochester. The criteria used in the selection of students in this
comparison study are listed in the table below. For this study, the L.A.U.
English and Spanish scores of Spanish-speaking students who had been
educated bilingually (Group B) were compared to scores of Spanish-speaking
students who had been educated solely in English (Group C). Scores from
the Metropolitan Achievement Test, the Language Assessment Battery, and a
language classification identified by the school system were also used in
the comparison. The results of the study, although simplified here,
indicated that the students educated bilingually were not performing as
well in English as were the Spanish-speaking students who are educated in
English, and that all Spanish-speaking students in the study were
performing slightly below monolingual English-speaking students of the
same ages. The consistency of the lower English scores of Group B can be
seen in Figure 1. The L.A.U. synonym task was the part that particularly
lowered the scores of Group B students. The lower scores most probably
occurred because some of the items in the synonym section called for
responses that were related lexical items (e.g., "stove-oven" and
"baby-child") but not necessarily perfect synonyms. Although this result
appears to demonstrate a negative effect on students who are educated
bilingually, it may on the contrary suggest that Spanish-speaking students
in traditional English instruction do not develop their bilinguality to
the same extent as do students taught in both languages. Of additional
interest was the superiority shown by Group B students in the digit
reversal task in both English and Spanish. As expected, Group B students
scored considerably higher than Group C students on all Spanish parts of
the test. Quartiles of the Spanish parts demonstrated that 75% of Group C
students were consistently at the level of the lowest 25% of Group B
students. The results of this study pointed to 1) the need for placing
greater emphasis on improving the English skills of all non-English
speaking students, and especially of those educated bilingually, and
2) some linguistic and possible extra-linguistic advantages of bilingual
education.
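
A quartile comparison of the kind reported above can be sketched as
follows. The scores are invented and do not reproduce the study's data,
and the "inclusive" quartile method of Python's standard `statistics`
module is an assumption, since the text does not say how the quartiles
were computed.

```python
from statistics import quantiles


def share_at_or_below_first_quartile(reference_scores, comparison_scores):
    """Return the reference group's first quartile and the share of the
    comparison group scoring at or below it."""
    q1 = quantiles(reference_scores, n=4, method="inclusive")[0]
    at_or_below = sum(1 for s in comparison_scores if s <= q1)
    return q1, at_or_below / len(comparison_scores)


# Hypothetical Spanish part scores for the two groups.
group_b = [10, 12, 14, 16, 18, 20, 22, 24, 26]
group_c = [8, 9, 11, 13, 15, 19]

q1_b, share_c = share_at_or_below_first_quartile(group_b, group_c)
```

A share near 0.75 for Group C against Group B's first quartile would
correspond to the pattern described in the text.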




CRITERIA FOR SELECTION

Group              Grade (no.    Criterion
                   students)

B (experimental)   7 (19)        Enrolled in school B (Rochester, NY)
                   8 (13)        bilingual program.
                                 In bilingual education for the last 5
                                 (or more) years.
                                 Live in the United States for at least
                                 5 years.
                                 Speak Spanish since childhood.

C (control)        7 (19)        Enrolled in school C (Rochester, NY)
                   8 (13)        traditional program.
                                 In traditional English instruction for
                                 last 5 (or more) years.
                                 Live in United States for at least
                                 5 years.
                                 Speak Spanish since childhood.




The reader is again reminded that the L.A.U., since the time of the
reported analyses, has undergone revision. Also, the reader is reminded
that the L.A.U. was in part evaluated for something for which it is not
intended, i.e., its validity in determining language proficiency was
considered as well as its validity in identifying language dominance. This
was done, in part, out of this researcher's concern that the assumption is
often made that a student is competent in the language in which he or she
is classified as being dominant. The dependence of language dominance on
language proficiency is an issue that should be further studied. Last,
the reader is reminded that this analysis was conducted by one researcher,
and that additional evaluative input is necessary for conclusive
assessment.

With these factors in mind, then, several comments can be presented
regarding language testing, as well as specific suggestions for users of
language dominance and proficiency tests. In selecting an appropriate
instrument, the following factors should be considered:

1. That the instrument has been subject to several, not just one or
   two, measures of reliability and validity.

2. That a team of experts, including at least one linguist, one
   psychologist, one statistician, and one teacher of bilingual
   students, has evaluated the testing rationale and item selection.

3. That statistical correlations are provided for test parts as well
   as totals, and that correlations are provided for all ages with
   which the test is to be used.

4. That comparability of the English and Spanish sections has been
   thoroughly studied, especially if test interpretation is done
   by the simple comparison of English and Spanish scores.

5. That linguistic analysis of the items is thorough, i.e., by
   methods other than the simple counting of words. Syntactic
   complexity and lexical difficulty are especially important.

6. That many acceptable responses are provided in the test manual so
   that administrator subjectivity is minimized as much as possible.

7. That, in addition to these suggestions, criteria such as those
   published by the Northwest Regional Language Laboratory be made
   known to test users.

It is necessary that both test users and test developers become more
aware of each other's rightful concerns. With this accomplished, new
instruments can be both technically valid and usable, and existing tests
can be more effectively administered and understood. It is essential that
test users be aware of the variety of statistical, linguistic,
psychological, and other factors involved in test validation. These
improvements in bilingual language assessment are necessary to ensure that
the linguistic abilities and needs of limited English-speaking students
will be most accurately identified and these students placed in programs
that are most beneficial to them.






The author acknowledges the cooperation of both Bernard Cohen and
the Rochester City Schools in this research.

For example, features such as loss of "s" in Puerto Rican speech
or variation of "r" and "l" are not scored as incorrect.






For example, deletion of a single word that does not alter the
meaning of the sentence. Many students repeated the sentence "My books
were falling out of my desk during recess" without the word "out."
These responses were scored as being correct.

The test in its written form does not list as acceptable responses
words in different classes. That is, adjectives must be matched with
adjectives, nouns with nouns, etc. In the scoring, however, some
responses were considered correct even if they were not of the same
class as the cue word. For example, "triste" (sad) was scored as a
correct response for the item "alegría" (happiness).

"High" reliability refers to a correlation coefficient above .850.
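As a rough illustration of how a coefficient might be checked against this cutoff, the sketch below computes a correlation between two sets of scores and compares it with .850. The use of the Pearson coefficient is an assumption, since the note does not name the statistic, and the function names and sample scores are hypothetical rather than taken from the L.A.U. data.

```python
import math

HIGH_RELIABILITY_CUTOFF = 0.850  # cutoff stated in the note above


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)


def is_high_reliability(r):
    """True when the coefficient lies above the .850 cutoff."""
    return r > HIGH_RELIABILITY_CUTOFF


# Hypothetical scores from two administrations of the same task.
first_scores = [10, 12, 15, 18, 20]
second_scores = [11, 12, 16, 17, 21]
r = pearson_r(first_scores, second_scores)
print(round(r, 3), is_high_reliability(r))
```

The same function could be applied to each test part separately as well as to totals, and repeated for each age group.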

The procedure used for this adjustment was that of linear regression.
Slopes and intercepts were obtained from original inter-rater
scattergrams.
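A regression adjustment of this kind can be sketched as follows, assuming an ordinary least-squares line fitted to paired scores from two raters, with one rater's scores then mapped onto the other's scale. The note does not say which rater served as the reference or how the fitted line was applied, so the direction of the adjustment, the function names, and the sample data are all hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("slope is undefined when all x values are equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjust_scores(scores, slope, intercept):
    """Map one rater's scores onto the reference rater's scale."""
    return [slope * s + intercept for s in scores]


# Hypothetical paired scores: the same responses scored by two raters.
rater_b = [1, 2, 3, 4]
rater_a = [3, 5, 7, 9]  # treated here as the reference rater (an assumption)

slope, intercept = fit_line(rater_b, rater_a)
adjusted = adjust_scores([5, 6], slope, intercept)
print(slope, intercept, adjusted)
```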

For purposes of morpheme count, the complexities of Spanish verb
morphology were felt to give an artificially complex picture. Therefore,
for purposes of morphological counting, a verb was considered as having
two morphological entities.


Syllable count in Spanish was based on natural conversational speed.
For example, the sentence "Canta a María" would be considered to have five
syllables: can-taa-ma-rí-a.
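The counting convention in this note can be approximated with a short sketch. It is a simplification and not the procedure used in the study: it counts vowel nuclei, merges identical adjacent vowels across word boundaries (as in "Canta a" becoming "can-taa"), treats two strong vowels in sequence as separate syllables, and treats any sequence containing an unaccented "i" or "u" as a single nucleus. The function name and the vowel classes are assumptions, and the letters "y" and "h" are handled as ordinary consonants.

```python
# Strong vowels; accented í and ú behave as strong and break a diphthong.
STRONG = set("aeoáéóíú")
# Weak (unaccented) vowels, which join a neighboring vowel in one syllable.
WEAK = set("iuü")
VOWELS = STRONG | WEAK


def count_syllables(sentence):
    """Approximate syllable count at conversational speed (simplified)."""
    # Drop spaces and punctuation so vowels can meet across word boundaries.
    letters = [ch for ch in sentence.lower() if ch.isalpha()]
    count = 0
    prev_vowel = None  # last vowel of the current vowel run, if any
    for ch in letters:
        if ch not in VOWELS:
            prev_vowel = None
            continue
        if prev_vowel is None:
            count += 1  # first vowel after a consonant starts a new syllable
        elif ch == prev_vowel:
            pass  # identical adjacent vowels merge into one syllable
        elif ch in STRONG and prev_vowel in STRONG:
            count += 1  # two strong vowels in sequence form separate syllables
        # otherwise a weak vowel is involved: same syllable (diphthong)
        prev_vowel = ch
    return count


print(count_syllables("Canta a María"))  # 5, matching can-taa-ma-rí-a
```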

A possibly more accurate measure of utterance length, measuring
recorded readings of the utterances, was not followed, since it was felt
that any benefits that might be gained did not justify the time and effort
involved.

Lexical frequency information was obtained for both English and
Spanish items in the same dictionary (Eaton 1940).

Contradictories exhaust options on a scale, e.g., male-female.
Contraries do not exhaust these options, e.g., large-small. Converses
differ in one component, which switches in argument, e.g., parent-child.
See Clark 197?:422 for details.







References

American Psychological Association. 1974. Standards for Educational
    and Psychological Tests. Washington, D.C.: American Psychological
    Association.

Barnett-Garcia, Nancy. 1979. A study of the assessment of language
    dominance and bilingual education. Unpublished doctoral
    dissertation. University of Rochester.

Ben-Zeev, S. 1975. The effect of Spanish-English bilingualism in
    children from less privileged neighborhoods on cognitive development
    and cognitive strategy. Unpublished research report to the National
    Institute of Child Health and Human Development.

Cohen, Bernard. 1977. Language Assignment Umpire. New York: Bernard
    Cohen Research and Development, Inc.

Delattre, Pierre. 1965. Comparing the Phonetic Features of English,
    French, German, and Spanish. London: George Harrap & Co., Ltd.

Eaton, Helen. 1940. An English, French, German, Spanish Word Frequency
    Dictionary. New York: Dover Publications, Inc.

Lambert, W. and G. Tucker. 1972. Bilingual Education of Children: The
    St. Lambert Experiment. Rowley, Mass.: Newbury House.

Pena, Albar A. and Ernest Bernal, Jr. 1978. Malpractices in Language
    Assessment for Hispanic Children. Occasional Papers on Linguistics.
    105-108.

Silverman, Robert J., Joslyn K. Noa, and Randall H. Russell. 1977. Oral
    Language Tests for Bilingual Students. Portland, Oregon: Northwest
    Regional Educational Laboratory.

Vygotsky, Lev Semenovich. 1962. Thought and Language. Cambridge, Mass.:
    MIT Press.



